Achieving a reliable compact acoustic model for embedded speech recognition system with high confusion frequency model handling
Abstract
An acoustic model for an embedded speech recognition system must satisfy two requirements: it must minimize the degradation in recognition performance, and it must solve the memory problem under the constraint of limited system resources. Moreover, for general speech recognition tasks, context-dependent models such as state-clustered tri-phones are used to guarantee high recognition performance of the embedded system. To cope with these challenges, we introduce the state-clustered tied-mixture (SCTM) HMM as a method of optimizing an acoustic model. The proposed SCTM modeling offers a significant improvement in recognition performance and also provides a solution to the problem of sparse training data. In addition, a state-weight quantization method drastically reduces the size of the model. However, models constructed only in this way are insufficient to improve the recognition rate in tasks where the items to be recognized are highly similar to one another, such as the Korean-digit recognition task. Hence, we also construct new dedicated HMMs for all or some of the Korean digits; these HMMs have exclusive states but use the same Gaussian pool as the previous tri-phone models. In this paper, we describe the acoustic model optimization procedure for embedded speech recognition systems and the corresponding performance evaluation results.
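The idea of states sharing one Gaussian pool while storing only quantized mixture weights can be illustrated with a small sketch. This is not the paper's implementation: the uniform 8-bit quantizer, the toy pool, the state names, and every function name below are assumptions made for illustration, since the abstract does not specify the quantization scheme or the model sizes.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def quantize_weights(weights, bits=8):
    """Assumed scheme: map each weight in [0, 1] to an integer code."""
    levels = (1 << bits) - 1
    return [round(w * levels) for w in weights]

def dequantize_weights(codes):
    """Recover weights from codes, renormalised so they sum to 1."""
    total = sum(codes)
    return [c / total for c in codes]

def state_log_likelihood(x, pool, weights):
    """log sum_k w_k N(x; mu_k, var_k) over the shared pool (log-sum-exp)."""
    terms = [
        math.log(w) + log_gauss_diag(x, mean, var)
        for w, (mean, var) in zip(weights, pool)
        if w > 0.0
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

# Hypothetical toy pool of three 2-D Gaussians shared by every state.
pool = [
    ([0.0, 0.0], [1.0, 1.0]),
    ([3.0, 3.0], [1.0, 1.0]),
    ([-3.0, 2.0], [2.0, 0.5]),
]

# Each state stores only a weight vector over the pool, not its own Gaussians.
state_weights = {
    "s1": [0.7, 0.2, 0.1],
    "s2": [0.1, 0.1, 0.8],
}
state_codes = {name: quantize_weights(w) for name, w in state_weights.items()}

frame = [0.2, -0.1]  # one hypothetical feature vector
for name, codes in state_codes.items():
    exact = state_log_likelihood(frame, pool, state_weights[name])
    approx = state_log_likelihood(frame, pool, dequantize_weights(codes))
    print(name, round(exact, 4), round(approx, 4))
```

In this sketch, adding a dedicated model (for example, for a confusable digit) would only add new weight vectors over the same `pool`, which is one way to read the abstract's "exclusive states using the same Gaussian pool"; the actual construction in the paper may differ.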
Similar resources
Development of a Mandarin-English Bilingual Speech Recognition System with Unified Acoustic Models
This paper presents our recent work on the development of a grammar-constrained, Mandarin-English bilingual Speech Recognition System (MESRS) for real-world music retrieval. Two of the main difficulties in handling bilingual speech recognition for real-world applications are tackled: one is to balance the performance and the complexity of the bilingual speech recognition system; the othe...
Large vocabulary Speech Recognition System: SPOJUS++
In this paper, we describe the Large Vocabulary Continuous Speech Recognition (LVCSR) system SPOJUS++, which has been developed in our laboratory for over 20 years and was recently fully reimplemented from scratch. SPOJUS++ employs a context-dependent Hidden Markov Model (HMM) as an acoustic model and an N-gram model as a language model to decode speech. SPOJUS++ has many novel features including a dyna...
Convolutional Neural Network with Adaptive Windows for Speech Recognition
Although speech recognition systems are widely used and their accuracy is continuously improving, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
Compact acoustic model for embedded implementation
An acoustic model for an embedded speech recognition system must exhibit two desirable features: the ability to minimize performance degradation in recognition, and the ability to solve the memory problem under limited system resources. To cope with these challenges, we introduce the state-clustered tied-mixture (SCTM) HMM as an acoustic model optimization. The proposed SCTM modeling shows a significant improvem...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterances into transcriptions. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Journal title:
- Speech Communication
Volume 48
Publication date: 2006